Self organization of a massive document collection
نویسندگان
چکیده
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. As the feature vectors for the documents statistical representations of their vocabularies are used. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
منابع مشابه
Representation of Document Archives for Interactive Exploration
Today's information age may be characterized by constant massive production and dissemination of written information. More powerful tools for exploring, searching, and organizing the available mass of information are needed to cope with this situation. In this context the map metaphor for displaying the contents of a document archive in a two-dimensional display has gained increased interest. I...
متن کاملA quickly trainable hybrid SOM-based document organization system
The large volume of nowadays document collections has increased the need of fast trainable document organization systems. This paper presents and evaluates a hybrid system to self-organization of massive document collections based on self-organizing map (SOM). The hybrid system uses prototypes generated by a clustering algorithm to train the document maps, thus reducing the training time of lar...
متن کاملRecognition and Analysis of Massive Open Online Courses (MOOCs) Aesthetics for the Sustainable Education
The present study was conducted to recognize and analyze the Massive Open Online Course (MOOC) aesthetics for sustainable education. For this purpose, two methods of the exploratory search (qualitative) and the questionnaire (quantitative) were used for data collection. The research sample in the qualitative section included the electronic resources related to the topic and in the quantitative ...
متن کاملGeneralizability of the WEBSOM Method to Document Collections of Various Types
WEBSOM is a method in which the self-organizing map algorithm is used to automatically organize collections of documents on a map to enable easy exploration of the collection. This article illustrates with case studies how collections of various types of text can be successfully organized using the WEBSOM. The emphasis is on describing the particular challenges that each type of material poses,...
متن کاملSelf-Organizing-Map-Based Metamodeling for Massive Text Data Exploration
In this study, we describe the use of the self-organizing map (SOM) as a metamodeling technique to design a parallel text data exploration system. Firstly, the large textual collections are divided into various small data subsets. Based on the different subsets, different unitary SOM models, i.e., base models, are then trained for word clustering map. In this phase, different SOM models are imp...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- IEEE transactions on neural networks
دوره 11 3 شماره
صفحات -
تاریخ انتشار 2000